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Abstract — A adapted tensor-structured GMRES method for the TT format is proposed 
and investigated. The Tensor Train (TT) approximation is a robust approach to high- 
dimensional problems. One class of problems is solution of a linear system. In this work 
we study the convergence of the GMRES method in the presence of tensor approximations 
and provide relaxation techniques to improve its performance. Several numerical examples 
are presented. The method is also compared with a projection TT linear solver based 
on the ALS and DMRG methods. On a particular sPDE (high-dimensional parametric) 
problem, these methods manifest comparable performance, with a good preconditioner the 
TT-GMRES overcomes the ALS solver. 

Keywords: linear systems and iterative methods and Krylov subspaces and inexact 
methods and structured methods 



1. Introduction 

Solving linear systems arising from problems with many dimensions is a 
computationally demanding task. Such problems are posed, for example, in 
quantum chemistry |l|2j . financial mathematics |3|4| and many others. To 
work with J-dimensional arrays is a challenging problem due to the curse of 
dimensionality |5]: the number of elements of a tensor grows exponentially 
with the number of dimensions d, and so does the complexity to work with 
fully populated tensors. For d of order tens or hundreds some other approaches 
are needed, for example, special low-parametric representations or formats. 
As soon as such a format comes with fast linear algebra algorithms, such as 
additions, Matrix-by- Vector multiplications and scalar products, any of the 
classical iterative methods can be implemented straightforwardly. The first 
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problem is that the effective amount of unknowns, required to store vectors 
involved in computations in a chosen format might grow in general arbitrary 
during the solution process. Second, most of classical methods are proved to 
be convergent in the exact arithmetics, and their behavior with approximate 
computations, arising from the use of formats is under the question. If the 
first issue depends essentially on a problem and has to be considered with 
a strong connection to a particular application, the second one gives more 
chances to be described in quite a general case. As for the GMRES method, 
such consideration will be presented in this paper. 

Several formats have been proposed to represent a tensor in a data-sparse 
way. They include canonical and Tucker formats, the two formats with well- 
established properties and application areas, see the review [6j for more de- 
tails. They have known drawbacks. To avoid these drawbacks, the develop- 
ment of new tensor formats began. In 2009 independently Hackbusch and 
Kuhn and later Grasedyck |7|8] and Oseledets and Tyrtyshnikov [U] proposed 
two (slightly different) hierarchical schemes for the tensor approximation, 
the Tucker and Tree Tucker formats. These formats depend on specially 
chosen dimension trees and require recursive procedures. To avoid the re- 
cursion, it was proposed to use a simple matrix product form of the de- 
composition |10|llj . that was called the Tensor Train format, or simply the 
TT- format. 

A tensor A is said to be in the TT- format, if its elements are defined by a 
formula 

A(h,...,i d ) = Gi(h)...G d (i d ), (1.1) 

where Gk(ik) is an r k-i x r k matrix for each fixed ik, 1 4 ^ n k- To make 
the matrix- by-matrix product in ( |1.1[ ) scalar, boundary conditions ro = r d = 1 
are imposed. The numbers rk are called TT-ranks and Gk(ik) are cores of the 
TT-decomposition of a given tensor. If ^ r^n^ ^ n, then the storage of the 
TT-representation requires ^ dnr 2 memory cells. If r is small, then this is 
much smaller than the storage of the full array, n d . 

One can go even further and introduce the Quantized TT (QTT) format 
|12|13] : if the mode sizes are equal to 2 P , we can reshape a given tensor to 
the 2 x 2 x • • • 2 tensor with higher dimension, but all mode sizes are equal to 
2, and then apply the TT approximation to this new tensor. The storage in 
this case is estimated as &(dr 2 \ogn). 

The TT-format is stable in the sense that the best approximation with 
bounded TT-ranks always exists and a quasioptimal approximation can be 
computed by a sequence of SVDs of auxiliary matrices |10|11] , 

The TT-format comes with all basic linear algebra operations. Addition, 
matrix- by- vector product, elementwise multiplication can be implemented in 
linear d and polynomial in r complexity with the result also in the TT-format 
|10|llj . The problem is that after such operations TT-ranks grow. For ex- 
ample, the TT-ranks of the sum of two tensors are equal (formally) to the 
sum of the TT-ranks of the addends. In the case of the matrix- by- vector 
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product the result is also in the TT-format with the TT-ranks of matrix 
and vector multiplied. After several iterations, the TT-ranks will become too 
large, thus the tensor rounding, or truncation is needed: a given tensor A is 
approximated by another tensor B with minimal possible TT-ranks with a 
prescribed accuracy £ (or a fixed maximal rank R) : 

B = -%, R {A), so that ||A-fi||f e\\A\\ F , and/or rank(fi) < R. 

The rounding procedure in the TT-format can be implemented in 0(dnr 3 ) 
operations |10|11] , 

In this work we implement the adapted GMRES solver using the TT arith- 
metics and truncations. It is worth to mention the previous papers devoted to 
Krylov methods with tensor computations: |14| (a FOM-like method in the 
case of a Laplace-like matrix), [15] (Richardson, PCG and BiCGStab meth- 
ods with application to parametric and stochastic PDEs), and, the closest to 
our work, a projection method for linear systems in the Jff- Tucker format 
which was proposed in |16j . Our GMRES method is slightly different. First, 
the Jff- Tucker method from |16] uses the projectors with equal sizes: 



Ax = b _> W^AV m y = W^b, V m £ C" xm , W m = AV m . 

A very important feature of the GMRES method is the rectangular Hessen- 
berg matrix computed via projections to subspaces with different dimensions: 

H m = V m+ ]AV m . 

(In this sense, the mentioned above method is a certain realization of geo- 
metric minimal residual method, the linear span of (nonorthogonal) vectors 
W m contains all Krylov vectors from V m +i except the first one). Moreover, we 
provide the error and convergence analysis with respect to the tensor rounding 
using the theory of inexact GMRES, and the relaxation strategies to reduce 
TT ranks and improve the performance. A convergence estimate was also 
provided for the tensor CG-type method for the eigenvalue problems in |17| . 
A part of our paper (Section 3| is devoted to the role of Matrix-by- Vector 
and Preconditioner- by- Vector multiplications in approximate computations, 
showing the differences between GMRES and CG methods. 

Note that the particular choice of the TT format in this paper is not 
important for the general theory and is due to the simplicity, convenience and 
robustness of the TT format in a wide class of problems. The performance 
improvements considered below, arising from the inexact Krylov theory, were 
also successfully applied for the tensor version of GMRES in the Tucker format 
by Dmitry Savostyanov. Numerical examples were presented on the Workshop 
on Tensor Decompositions and Applications (TDA 2010), Monopoli, Bari, 
Italy. 
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The rest of the paper is organized as follows. In the next section we briefly 
review the scheme of the GMRES method. In Section |3] we discuss the influ- 
ence of preconditioners, especially in the case of errors caused by tensor round- 
ings via SVD. In Section [4] we explain the inexact GMRES theory, provide 
the error analysis for the approximate TT-GMRES and the whole algorithm. 
And in the last section five we present several numerical examples, and com- 
pare also two methods: TT-GMRES and the DMRG-ALS linear solver from 



2. Exact GMRES method in standard arithmetics 

In this section we briefly recall the GMRES method following [TD]. We are 
going to investigate, how influence the errors arising from the tensor roundings 
to the convergence of the methods. Moreover, if we are solving discretized 
PDEs, we have to consider carefully, which norm of the residual one shall 
use. 

Let us present the basic properties of this method. Suppose we are going 
to solve a linear system 

Ax = b, AGC" XM , x,b£C". 

The method is based on the minimization of the residual functional \\b — Ax\\ 
on the Krylov subspaces: 

je k = {b,Ab,A 2 b,...,A k - l by 

To build the Krylov basis one uses the Arnoldi process (see the algorithm 



2.1), which is nothing else but the Gramm-Shmidt orthogonalization method 



applied to the Krylov vectors. After k steps we have the orthonormal vectors 
V k+ \ and k+ 1 x k matrix H k = \hu\. Now we have to obtain a correction to 
the solution. 

The vectors v,- possess the following property: 

AV k = V k+1 H k . (2.1) 

Suppose the initial guess xq is given, the initial residual ro = b — AxQ. We are 
to solve the least squares problem 

min ||/-A(xo + z)|| = min ||r -Az\\. 

zeX k zeJt k 

Now put z = Vjfcy, reformulate the functional for the vector y, which is small, 
if k < n: 

J(y) = \\p Vl -AV k y\l 



where j8 = \ \ro\\. Taking into account (2.1), we obtain 

J(y) = WVk+lifa -H k y)\\ = \\p ei -H k y\\, 
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since ||Vt+i|| = 1 due to the orthogonality, e\ is the first identity vector of size 
k+\. Now we write the correction to the solution: 

x k = x + V k y k , y k = argmin||/3ei -H k y\\. 

In the Arnoldi process the number of basis vectors grows up to the size 
of a matrix, resulting in a prohibitive amount of memory and computational 
time. To avoid this, one uses GMRES with restarts: every m steps the current 
solution is taken as the initial guess, and the algorithm restarts. The overall 



scheme of the GMRES (m) is given in the Algorithm 2.1 



Algorithm 2.1 GMRES (m) 



Require: Matrix A, right-hand side b, initial guess xq, stopping tolerance e. 
Ensure: Approximate solution x m : \\Ax m — b\\ ^ £. 
1: Start: compute r§ = b — Axq, v\ =ro/||ro||. 
Iterations: 
for j = l,2,...,m do 

hij = (Avj,Vi), i = 1>2,...,7, {Arnoldi process} 



2 
3 
4 

J 

5: Vj+i =AVj- £ hijVi, 
£=1 

6: hj+ij = ||v/+i||, and 

7: V j+1 =V j+1 /h J+1 j. 



9 
10 



12 
13 



end for 

Assemble the matrix H m = [hi j] , j = 1 , ...,m, i= 1 , ... , j + 1 . 

Compute the least-squares solution: y m = argmin||j8ei —H m y\ 

y 

11: x m =x + V m y m . {Update} 

Restart: compute r m =b—Ax m . Stop if ||r m || ^ £. 
Otherwise set xq =x m , v\ = r m /\\r m \\ and go to 2. 



One of the nice properties following from (2.1 ) is a cheap way to compute 
the residual: 

||J3ei-fijtyjfc|| = \\b-A(x + V k y k )\\, 
so we can check the stopping criteria using only small vectors e\,y k and matrix 

For the exact GMRES the following property is shown in [19J: 

Theorem 2.1. The solution Xj, obtained on 7-th GMRES step is exact if 
and only if Vj + \ = 44> hj + \j = 0. 

In the following we will consider the inexact GMRES, for which this theorem 
does not take place. 
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3. Role of preconditioners and smoothers 

A well known way to improve the convergence of an iterative method is to 
use a preconditioner: 

Ax = b — > MAx = Mb, or AMy = b, x = My, 

which may shrink the spectrum of a matrix to a small interval (for the dis- 
cretized PDE problems one usually requires a mesh-independent spectral in- 
terval), or make clusters of eigenvalues (see |20)21| ). 

The first formula is called a left preconditioner, the second is a right one. 
The main difference of these approaches (at least, for symmetric matrices) 
is what residual is computed and considered: the first works with | \MAx — 
Mb\\, i.e. the preconditioned residual, the second - with the real residual 
1 1 Ax — b\\. Usually in the solution process the residual-based stopping criterion 
is used, and in the case of the left preconditioner the residual has to be 
computed explicitly (whereas the norm of the preconditioned residual can be 
computed rapidly in the GMRES method). However, if the preconditioner 
is close enough to the inversed matrix (MA = 1 + E, \\E\\ <C 1), it might be 
worth to use the preconditioned residual, since it provides information on 
the solution error, which is more important (and relevant), than the residual. 
Indeed, 

MAx-Mb = (I + E)x-(I + E)A~ 1 b = {x-A^ 1 b)+Ex-EA^ 1 b nsx-A~ l b. 

In some cases the norm of E might be even greater than 1, but if it does 
not depend on a grid size, the preconditioned residual still gives relevant 
information on the error, in the sense that the constants of equivalence ci||jc — 
A" 1 ^!! ^ ||MAx — Mb\\ ^ C2||x— A -1 ^!! do not depend on h. Moreover, on the 
usual scales of errors arising in tensor arithmetical roundings (10~ 4 — 10~ 6 ), 
the real residual might give no information on the convergence at all. 

So, consider the tensor rounding in the following form. Suppose a tensor 
u is given, and consider a low-rank approximation 



Since the correction £ is composed from the last singular vectors of TT-blocks 
of u, it contains in general harmonics with higher frequencies, than u. Let us 
illustrate it on the following example. Consider a ID function u on the interval 
[—1,1] and its Fourier decomposition: 



u = u + e. 




m=l 



Take a partial sum of this sequence as an approximation. 



R 




m=\ 
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From the Parseval's theorem it is known, that if the coefficients are computed 
as follows: 

(M,cos(7lmx)) (u,sm(nmx)) 



(cos(nmx) ,cos{nmx))' m (sin(7rmx),sin(7rmx)) ' 

then the approximation with harmonic functions u is optimal in the L2-norm. 
The approximation error is written as the following sum: 

DO 

u — u= ^ a m cos ( %mx) + f5 m sin ( %mx) , 

m=R+l 

i.e. it contains harmonics with the frequencies greater than R. The singular 
value decomposition provides the optimal rank-r approximation to a matrix 
in the 2-norm, and the same phenomenon occurs. 

Consider now action of the second derivative operator d 2 /dx 2 to the ap- 
proximated function u. It reads 

d U d 11 i 2 2 / \ 2 In • / \ 

= 2^, 71 m a m cos(7Cmx) + % m p m sm(7Vmx), 



m=R+l 



and 



d 2 u d 2 u 
dx 2 dx 2 



> % 2 R 2 lit* — u\ 



The approximation u in the L2 norm might provide a sufficient accuracy £, 
but the error in the second derivative is in the order of R 2 e, which might 
be prohibitively large. The discretization matrix A of the second derivative 
operator has the norm &(\/h 2 ), so 

||Afi-Aw|| < ||A||e = G\ -^-e) >£. 



h 2 , 

When we consider relative quantities, if the accuracy of the linear system 
solution is \\x— A _1 ft||/||x|| = £, the residual norm might be | \Ax — b\ \/\ \b\ \ = 
i^(cond(A) e). If the tensor rounding accuracy is £ = 10~ 5 , and the problem is 
discretized on 1000 grid points, then cond(A) = <^(10 6 ) and ||Ajc — i>||/||&|| = 
<^(10). The L2-norm accuracy of the order 10~ 5 might be sufficient, but one 
can not control it, as the residual norm is greater than 1. 

So, for tensor iterative methods, use of a preconditioner is important not 
only for the convergence acceleration, but also to keep the equivalence con- 
stants between the error and the residual in the order of 1. Moreover, it is 
important to use the left preconditioner, when we multiply first the stiffness 
matrix on a vector, and then the preconditioner, which reduces the errors in- 
troduced by tensor roundings (smoother). In this sense, the Conjugate Gradi- 
ent method is not very efficient. Indeed, consider the PCG algorithm (it works 
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in terms of the right preconditioner AMM x = b, see e.g. [22J): 

r Q = b-Ax Q , Pl =Mr 
ctj = (ri^i,Mri-i)/(Api,pi) 

Xi=Xi-i+(XiPi 

n = r,-_i -CCjApi 
P i =(r i ,Mrt)/(r i - 1 ,Mr i - 1 ) 
p /+ i =Mri + p iPi . 

In the formulas for OC, and r\ the last operation before the scalar product or 
linear combination is the MatVec with the stiffness matrix A P j, which might 
be computed with a very large error, if P i is rounded in the L2 norm. As a 
result, new iterands are computed with such a large error, and the method 
diverges. In this sense, it is much more efficient to use the GMRES method 
with the left preconditioner, when the next Krylov vector is computed as 
v,-+i = MAv(. Numerical experiments conducted show, that the roundings in 
this operation can be even applied sequentially: 

V/+1 = PefiiM&efiiAvi)) 

without corrupting the final result. It is especially important if the precon- 
ditioner is combined from several matrices, e.g. the preconditioner from |21] . 
when the whole matrix MA can not be computed explicitly due to very high 
ranks, and multiplications are applied during several successive implicit pro- 
cedures, which provide approximate products (for example, the DMRG-based 
TT-MatVec, see [23118] for the DMRG schemes). 

4. Relaxation strategies, inexact GMRES, and the TT-GMRES al- 
gorithm 

The inexact Krylov methods theory [24 25 26j allows us to estimate influence 
of the noise arising from the tensor roundings, on the GMRES convergence. 
Moreover, the performance of tensor methods essentially depends on the TT 
ranks of the intermediate vectors. It turns out in practice, that if we truncate 
all the vectors with the same accuracy, the ranks of the last Krylov vectors, 
being added to the Krylov basis, increase from iteration to iteration. Relax- 
ation strategies, proposed in the papers [24 25 26j allow us to truncate the 
latter Krylov vectors less accurately, than the former ones, thus keeping the 
ranks at almost constant values, or even reducing them on last iterations. 

We can consider roundings as the usual MatVecs with the perturbed mat- 
rix: 

5^(Av) =Av = (A+E)v, 

where E is the error matrix, which, generally speaking, changes each time, 
when the Matrix-by- Vector multiplication is computed, and moreover, might 
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depend on v. Our goal is to estimate allowed values for ||£|| which provide the 
convergence until some desired accuracy. It can be proved, that during the 
Krylov iterations the norm of the error matrix ||£|| can be increased, i.e. the 
vector operations on the last iterations can be computed with worse accuracy. 



A linear system reduction to the Krylov subspace (2.1) for the inexact 
GMRES is written as: 

[(A+£i)vi ••• (A + E m )v m ]=V m+l H m . (4.1) 

Notice that V m is no more a basis in the exact subspace J%„. The al- 
gorithm remains the same, as for the exact GMRES, but the minimization of 
||j3ei —H m y m \\ does not lead any more to the minimization of the real residual 



\\b — Ax m \\. From (4.1) follows, that we are solving the following optimization 
problem: 

min ||r -tf||, r = b - (A + E )x , 

q£R(W m ) 

and R(W m ) is a linear span of the vectors W m = V m +\H m . So, in fact we are 
minimizing the computed approximate residual 

H^mH — H^O *]m\ \ — \hm+\,m^mym\' 

If the new Krylov vector {A+E m )v m on some iteration appears to be almost 
linearly dependent with the basis V m , the quantity /z m +i jm is small, and the 
approximate residual ||r m || is also small, which we can consider as a conver- 
gence of the method. But the real residual ||r m || might be significantly larger, 
and the estimate of the difference ||r m — r m \\ will be considered below. 
We formulate the main theorem which comes from |25|: 



Theorem 4.1. Suppose some £ > is given, for the system Az = tq m 
GMRES iterations are conducted, the computed residual r m was obtained. 
Then if for any i ^ m holds that 

1 1 7-t 1 1 , Gm{Hm) 1 

\\ E i\ \ ^ " ^TT^ TT £ ' 

m |K(-i|| 

where o m {H m ) is a minimal singular value of the Hessenberg matrix of the 
reduced system, for the real residual r m = r$ — Az m the following estimate 
holds: 

\ym ^m\ \ ^ £ • 

We refer for the proof to |25] . 

So, the accuracy of the MatVec computation can be relaxed inversely 
proportional to the current residual, and if the process is stopped (in the 
case of stagnation, or if the computed residual becomes smaller than the 
stopping tolerance), the real residual will differ from the computed one on the 
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quantity not greater in the norm than £, i.e. the convergence of the method 
is controlled. 

In order to obtain scale-independent estimates (i.e. the same for the sys- 
tems Ax = b and aAx = ab), one usually consider the relative residual, and 
the corresponding stopping criteria, for example, 




In the same way one can consider the difference between the real and com- 
puted residuals: \\r m — f m \\/\\b\\ ^ £. In this case the result of Theorem |4 . 1 1 can 
be reformulated for the relative quantities, taking into account that £ = s\\b\\: 




e, 



The minimal singular value of H m can be estimated from the minimal 
singular value of A: 

so we formulate the following relaxation strategy for the MatVec error: 



Corollary 4.1. Suppose m GMRES iterations are conducted. If for any 
i ^ m the relative error introduced in the Matrix-by- Vector multiplication is 
bounded by the following rule: 

\\F,\\ 1 1 

< p (A 1\ 

^ mcond(A) ||rv_i | 1 n,un ' 



than the real relative residual and the computed one are connected with 

I ^ m | ^ | Fin | I p 
MM ^ 1 1 All ~rt.- 



With a good spectrally equivalent preconditioner cond(A) = ^(1) (notice that 
the matrix A is considered to be already left-preconditioned here) , and m = 
<^(1), in this case we can consider (4.2) in the following form: if 



\Ei\ 



1 



ri-l 



then the inexact GMRES will converge to the relative residual not greater 
than 

mcond(A)£. 

This approach will be used in the numerical experiments below. 
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Let us write the final algorithm 4.1 of the tensor GMRES with relaxations. 
Notice also, that in the Arnoldi process we used the modified Gramm-Shmidt 
algorithm, which is more stable in the presence of the rounding errors. In 
addition, as the left preconditioner is used, we do not write it explicitly, but 
assume that the matrix A and the right-hand side b are already precondi- 
tioned. 

One additional thing which is important to note, is when to perform tensor 
rounding, either after adding all the summands in the orthogonalization and 
correction steps, or after each addition. Formally, one can introduce the error 
only to the MatVec itself, but the orthogonality of V m must be kept despite 
the perturbations in (4.1). Moreover, significantly different in magnitude vec- 
tors yj(i)vj are also better to sum exactly. The obvious drawback is the rank 
overhead which can be m times larger than in the case of step-by-step trun- 
cations. So when possible (small number of iterations) it is worth to perform 
only final truncation when the summation is ready (in the case of small mode 
sizes (Quantized TT) it can be easily done by the DMRG truncation (see 
next section) instead of the direct one from [9]). 



5. Fast and accurate TT arithmetics in high dimensions 

One class of interesting high-dimensional problems is the multiparametric 
problems arising in the discretized Karhunen-Loeve model for the PDEs with 
stochastic data. In such problem, the number of parameters is usually in 
the order of tens, and after the tensorisation (Quantisation), the number of 
dimensions is in the order of hundreds. Even for ID physical problem, the 
QTT ranks scale usually linear with the number of parameters, thus keep the 
values 50-100. In this case, the multiply- and-compress strategy fails, because 
of the prohibitive complexity &(dnr 6 ). A better alternative is to use direct 
minimization methods, based on the alternating directions approach, the ALS 
and DMRG (also known as MALS) schemes. There are several papers on the 
linear- and eigenvalue solvers using the DMRG scheme |23)27lll8j . The simple 
approximation problem is discussed in these articles as well, and now there 
is the new one |2H], concerning specially the approximate Matrix- by- Vector 
product. 

Unfortunately, the main disadvantage of all presented TT-DMRG meth- 
ods is the tendency to underestimate ranks in essentially high- dimensional 
problems. Recall briefly the main sketch of the approximation via the DMRG 
(MALS) scheme: 

1. Suppose a functional J(x) to minimize is given (e.g. J(x) = \ \x — y\\ 2 ). 

2. Initial guess for x in the TT format is given: x = X\(i\) ■ ■■X c i{id). 

3. Choose two neighboring cores and convolve a supercore: X^ikjXk+i (4+1 ) — > 
W k (i k ,i k+ i). 
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Algorithm 4.1 Relaxed TT-GMRES(m) 

Require: Right-hand side b, initial vector xq in the TT format, matrix A as 
a tensor MatVec procedure y = ^^(Ax), accuracy £ and/or maximal TT 
rank R. 

Ensure: Approximate solution xj : \\Axj — b\\/\\b\\ ^ £. 
1: Start: compute ro = ,%^{b — Axq)^ j3 = ||ro|| vi = ro//3. 
2: Iterations: 
3: for j = 1,2, ...,m do 

4: Compute the relaxed accuracy 5 = - rrjT,- 

m-iW/p 

5: w = ^^(Avy) - new Krylov vector. 
6: for i = 1,2, ...,_/ do 

7: Af,y = (w,Vj), 

8: w = w — A/,yVi, {orthogonalization} 

9: end for 

10: w = ^§ ,r(w). {compression} 

11: /Jy+ij = | H|, Vy+i _= w/h j+1J . 

12: Assemble matrix //y = [Ajjt], ^ = l,...,j, / = + 1- 

13: Compute a solution of the reduced system: yj = argmin||j3ei — Hjy\\. 

14: Check the residual ||f ; -|| = \ \pe\ —Hjyj\\: if ^ e, then break. 

15: end for 

16: Update the solution: initialize xj = xq, 

17: for i = 1,2, do 

18: xj = xj +yj(i)vt {correction} 

19: end for 

20: Xj = ^e,r{x) {compression} 

21: Restart: if ||r/||/||&|| > e, then set xq =%j, go to 1. 



4. Solve the reduced optimization problem for the elements of W4: = 
argmin/(x). 

Wk 

5. Recover the TT structure (e.g. via SVD): Wk ~ Xk{ik)^k+i{ik+\) ■ 

6. Consider the next pair of cores, and so on.. 

The rank is determined adaptively on the step 5. The ranks are not known 
in general, and we usually start from a low-rank initial guess, subsequently 
increasing them during the DMRG iterations. The problem is that if we are 
using the fixed £-truncation of singular values, the ranks determined become 
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underestimated, as the dimension increases. There are two factors. First, the 
worst-case error accumulation in the whole tensor is d£, if the local errors 
in each block are bounded by £ |9]. Second, instead of direct compression 
routine from jS], where the fixed cores are cores of the initial tensor, here we 
are working with a projection to some tensor with blocks, which are far from 
the good approximation (on early iterations), and moreover, have insufficient 
ranks. To get rid of this, in this work we used the algorithms modified as 
follows: 

• First, set the accuracy for the local truncation to £/ oc = e/d. 

• Second, after the rank is truncated according to £/ oc , artificially add 
more singular vectors (thus obtaining the truncation with increased 
accuracy and rank). This additional rank can even be determined ad- 
aptively, depending on the convergence of the current supercore, by 
comparison with the approximation from the previous iteration. 

The approximation computed this way might have overestimated ranks. To 
reduce them to proper values, it is sufficient to conduct the last iteration with 
the standard truncation without including additional singular vectors (in fact, 
it performs like the direct compression routine, as the proper approximation 
is already achieved on this step, but the complexity is now ff{{r + r ac i c i) i ) 
instead of ^(r 6 ), and the additional rank is usually significantly smaller than 



For the DMRG-solve routine, we will show the role of the increased-rank 
truncation in the next section. But for the approximations and MatVecs in 
the TT-GMRES, we always keep it on. 

6. Numerical experiments 

The TT-GMRES method and the numerical experiments were implemented 
using the routines from TT Toolbox 2.1 (http://spring.inm.ras.ru/osel/) in 
the MATLAB R2009b and conducted on a Linux x86-64 machine with Intel 
Xeon 2.00GHz CPU in the sequential mode. 

6.1. Convection-diffusion (Table [T] Figures [T] - [7| 

The first example is a 3D diffusion-convection problem with the recirculating 
wind 




[ u y= \ = 1, Ug a \{y=l} = 
discretized using the central-point finite difference scheme: 




-A -> -Ah = (— A][) ®/<g>/+/® (— Aj[) <8>/+/®/<g> (— Aj), 
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du „ i du ,, i 

— -». v£ = vj®/®/, -> v£=/®vl®/, 

"2-10 ] r 0.5 

-1 2 -1 -0.5 0.5 

0-12-10 v 1 - 1 -0.5 0.5 

. . .. ' h ~h .. .. .. ' 

-1 2 J [ -0.5 0_ 

h = l/(n+l) is a grid size. The scalar parameter a (diffusion scale) varies 
from 1 to 1/50 in the numerical tests below. 

We use the TT data representation (without the QTT structure), so the 
TT ranks of the stiffness matrix 



1 



- aA h + (diag (l - x 2 ) ® diag (2y) ® i) ■ V x h + (diag ( -2x) ® diag (l - y 2 ) ® i) ■ V y h 

are bounded by 4 (the ranks of —A/, are all equal to 2, see |29|). 

To solve this problem efficiently, we use the inversed discrete Laplacian 
— A^" 1 as a preconditioner (although this is not the optimal preconditioner, and 
the convergence depends significantly on a, the problem is tractable within 
our range of Reynolds numbers). To implement the inversed Laplacian in the 
TT format we used the quadrature from |3U|31| : if 

A/, = Aj[<g>/®---<8)/H h/®---<g>/<S>A^, 

then 

M d 

A h l - E c k 0exp(-t k A l h ), 

k=-M p= l 

where t\ = e kri , c\ = r\t kl T\ = n/y/M, with the accuracy &(e^ n ^) 1 so that 
r A -i = ^(log 2 (l/e)). In practice this formula can be accelerated (giving the 
complexity G{n logn)) by using Fast Trigonometric transforms (in our case 
of Dirichlet boundary conditions the appropriate transform is DST-I) with 
all TT ranks equal to 1 |32j . and compressing only the diagonal matrix with 
inversed eigenvalues. 

The timings of the TT solver are compared with ones of the standard 
full-vector GMRES solver, with the same preconditioner implemented in the 
full format using the trigonometric transforms as well, with the complexity 
<^(n 3 log n). 

The tensor rounding accuracy for the solution is fixed to £ = 10~ 5 , and 
the accuracy for the Krylov vectors is determined according to the relaxation 
strategies. 

First, we check the convergence properties of the preconditioner (Table [TJ 
Fig. Fl]). The number of iterations is stable with respect to the grid size, but 



TT-GMRES: on solution to a linear system in the structured tensor format 



15 



Table 1. Number of iterations versus the grid size (n) and diffusion scale (a) 



n 


1 


1/2 


1/5 


1/10 


1/20 


1/50 


64 


5 


6 


10 


17 


30 


60 


256 


5 


6 


10 


17 


30 


60 



Convergence history, n = 256 




iteration 

Figure 1. Convergence history for the convection example 



grows approximately linearly with the Reynolds number. The convergence 
histories for different a and n = 256 are given on Fig. [T] 

The behavior of the TT ranks during the iterations (we measure here the 
highest rank max r,) of the Krylov vectors and the solution is presented 

i=l,..,d— 1 

on Fig. ^1 p3l respectively. The solution rank grows from 1 (zero tensor) to 



maximal TT rank of Krylov vectors, a — 1/50 maximal TT rank of the solution, a = 1/50 




10 20 30 40 50 60 10 20 30 40 50 60 



Figure 2. Maximal TT rank of the last Figure 3. Maximal TT rank of the solution, 
Krylov vector, convection example convection example 
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its stable value with a weak (approx. logarithmic) dependence on the grid 
size. The Krylov vector rank has its maximum at the middle iterations on the 
finer grids (it is also important, that it grows slightly with the grid size, it 
will be reflected in the computational time), but near the end of the process, 
it begins to decrease due to the relaxed accuracy. 

Now, consider the computational time of the TT-GMRES solver and the 
standard full GMRES method in MATLAB with the same Fourier-based pre- 
conditioner. The CPU time of the TT solver is presented on Fig |4j and the 
log-log scale plot is on Fig. [5] The linear fitting on the log-log plot gives the 



CPU Time (see), TT-GMRES 



CPU Time in the log-log scale, TT-GMRES 





n log 2 (n) 

Figure 4. CPU time (sec.) of the TT solver, Figure 5. CPU time (sec.) of the TT solver 
convection example in the log-log scale, convection example 

experimental complexity rate n . The overhead with respect to the true lin- 
ear complexity appears from the additional logarithmic terms in the Fourier 
transforms and approximately logarithmic grow of the TT ranks of the Krylov 
vectors, see Fig. [2] 

The full solver manifests the complexity rate n , which lies in a corres- 
pondence with its theoretical estimate ^(« 3 logrc) (see Fig. g for the CPU 
time itself, and [7] for the log-log scale) . Notice that the full solver timings are 
presented only for grid sizes not larger than 256. We were not able to perform 
the calculations on the grid 512 due to insufficient memory resources. Nev- 
ertheless, the extrapolation via the linear fit from the Fig.[7]gives an estimate 
2 14 ~ 20000 seconds for that experiment, which is about 15 times larger than 
the corresponding times of the TT solver. 



6.2. ID stochastic (parametric) PDE (Tables [2jj5] Figures |8j [9]) 

In this example we consider a ID stochastic (multiparametric) equation from 
[331: 



ox ox 



f(x) = \ inOxy = [-l,l]x[-l,l] 



(6.1) 
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CPU Time (sec), Ml GMBEK 



CPU Time in the log-log scale, lull GMRES 



= 1/* 

- 1/20 

- I 'II 



2(11111 
1 800 
1600 
1 1011 
12(10 
Eh 1000 

MID 
COO 
400 
20(1 



6.5 7 7.5 8 
n log 2 (") 

Figure 6. CPU time (sec.) of the full solver, Figure 7. CPU time (sec.) of the full solver 
convection example in the log-log scale, convection example 

Dirichlet boundary conditions on d£l, and the coefficient is given as a Karhunen- 
Loeve expansion: 





a(x,y) = a (x) + ]jT JXjaj(x)yj, with 

7=1 



ao(x) = 1, 



1 



2(7+ 1; 



aj(x) = sin(7tjx). 



The problem is then d+ 1-dimensional, and is not tractable in the full format. 
It is again discretized using the FD scheme with the collocation method in 
the parameters on uniformly distributed points. We use the preconditioner 

eh 



P 2 = A~T(l/a)A 



-l 



where T(a) is a stiffness matrix of the discretized elliptic operator (6.1) with 
the coefficient a. The parametric inversed Laplacian reads just A~ l ®I yi ® 
■■■®Iy d - Moreover, we used the QTT format in this example, with the ex- 
plicit analytic QTT representation of the ID A^T 1 from |29| . To compute the 
reciprocal coefficient, we used the TT-structured Newton iterations. In the 
following, unless specially noted (table [5J, we fix the tensor rounding accur- 
acy to £ = 10~ 5 . 

This example is essentially high-dimensional, with large ranks of the solu- 
tion, and what is more important, of the coefficients. Hence we have to use the 
DMRG compression routines. The increased-rank truncation strategy allows 
to keep the accuracy, correspondingly increasing the time. But without it, one 
might get no relevant solution at all. We will demonstrate it in a comparison 
with the DMRG-solve algorithm. 

We show in Tables \2\ Si El the number of iterations (it. 



solution time (T, 

sec), stabilized preconditioned residual (resid.) and the maximal TT rank 
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Table 2. d = 20, £ = 1(T 5 



n x 


n y 


it. 


T (sec.) resid. 


rank 


128 


64 


3 


129.2 3.19e-6 


28 


256 


64 


3 


124.1 


2.93e-6 


28 


128 


128 


3 


133.8 


4.64e-6 


27 


128 


256 


3 


148.3 


4.68e-6 


28 



Table 3. d = 40, e = 10 -5 



n x 


n y 


it. 


T (sec.) resid. 


rank 


128 


64 


3 


413.7 2.13e-5 


33 


256 


64 


3 


409.8 


1.93e-5 


33 


128 


128 


3 


334.7 


1.51e-5 


36 


128 


256 


3 


456.3 


1.90e-5 


33 



Table 4. d = 80, £ = 10 -; 



n x 


My 


it. 


T (sec.) resid. 


rank 


128 


64 


3 


1187 1.71e-5 


37 


256 


64 


3 


1280 


1.70e-5 


36 


128 


128 


3 


1122 


1.82e-5 


35 


128 


256 


3 


1336 


2.03e-5 


33 



versus the spacial n x and parametric n y grid sizes and the number of para- 
meters d. 

Consider a dependence on the tensor rounding accuracy £ in the case 
n x = 128, n y = 64, d = 20. As in the previous tables, we show the number 
of iterations, solution time, stabilized residual and maximal TT rank, see [5] 
With increasing accuracy, the computational time increases drastically, as it 

Table 5. Dependence on the rounding accuracy £. Problem sizes n x = 128, n y = 64, d = 20. 



£ 


io- 3 


io- 4 


10" 5 


10~ 6 


it. 


2 


2 


3 


3 


T, sec. 


7.31 


16.98 


129.2 


384.41 


resid. 


2.05e-4 


7.90e-5 


3.19e-6 


7.31e-8 


rank 


8 


13 


28 


46 
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depends both on the number of iterations and TT ranks. 

Now, consider the TT-DMRG-solver from [18] applied to the same prob- 
lem i^r(a)i< = P%f with n x = 128, n y = 64 and d = 20. Following the section^] 
we compare two variants of rank truncations in the superblock splitting: with 
fixed e and additional rank increasing. The convergence histories are shown 
on Fig. [8j and the cumulative times on Fig. |9l respectively. 



preconditioned residua] vs DMRG iteration 



Total time of DM KG iterations 




in 20 30 40 50 60 70 80 90 100 
ii- t ,ii i .i. 




10 20 30 40 50 60 70 80 90 10(1 
iteration 



Figure 8. Relative residuals, sPDE, DMRG Figure 9. Cumulative times, sPDE, DMRG 
solvers solvers 

We see, that increasing of truncation ranks improves the convergence sig- 
nificantly, despite that the time of each iteration is larger than with the fixed- 
accuracy truncation. 

Notice the difference in time between the GMRES and DMRG solver. To 
achieve the same residual 4 • 10~ 6 GMRES spent 129 sec, whereas DMRG 
(with increased ranks only) took about 300 sec. This shows the advantage 
of rapidly converging GMRES, provided a good preconditioner is given. It is 
natural that the also DMRG-based approximate MatVecs (which are in fact, 
just the DMRG truncations, up to additional structure of TT blocks, provided 
by their construction as Matrix by Vector multiplications) are cheaper than 
the linear system solutions. 



7. Conclusion 

The adapted GMRES method in the TT format for a linear system solution 
was proposed and investigated. For the method presented the error analysis 
and performance improvements obtained with the aid of the inexact Krylov 
methods theory. The numerical experiments show, that the method provides 
a linear with respect to the grid size complexity in the case of TT approxima- 
tion, and even logarithmic complexity with the QTT format. The method was 
compared with the direct ALS/DMRG-type minimization solver for the TT 
format. These methods manifest comparable timings and accuracies, and the 
GMRES method might be recommended in cases, when a good preconditioner 
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is known. 
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